Abstract


Techniques are developed for approximation and exact computation of the asymptotic limit of the 
item parameter estimates obtained by application of joint maximum-likelihood estimation to the 
Rasch model. 
For the Rasch model (Rasch, 1960) for binary responses, estimation of person and item
parameters via joint maximum likelihood remains common despite consistency problems that have
been known for a long time (Andersen, 1973) and despite the fact that these consistency
problems disappear if conditional maximum likelihood is employed. To some extent, this practice
appears to reflect the existence of American commercial software such as Winsteps that performs
joint estimation, although Winmira and Conquest are examples of readily available commercial
software for conditional maximum likelihood. Users of joint estimation may also be influenced
by the fact that bias problems decrease as the number of items becomes large
(Haberman, 1977, 2004). In this report, some tools are provided for bias assessment.

Section 1 summarizes known results concerning bias. Section 2 provides an approximation 
for bias in the case in which the variability of item parameters is small. Section 3 provides a 
general approach to computation of bias for a given set of item parameters and a given ability 
distribution. Section 4 considers consequences of the results of this report when joint estimation 
is applied to equating. Section 5 provides some concluding observations. 

1 Asymptotic Limits for Item Parameters 

In this section, the basic limiting behavior of maximum-likelihood estimates is considered for
the binary Rasch model (Andersen, 1973; Fischer, 1981; Haberman, 1977, 2004). Results in this
section are all known. Let $X_{ij}$, $1 \le i \le n$, $1 \le j \le q$, be binary random variables with values 0
or 1, such that $X_{ij}$ represents a response of examinee $i$ to item $j$, with $X_{ij}$ equal to 1 for a
correct response and equal to 0 otherwise. Let $q \ge 2$, let $\mathbf{X}_i$ be the $q$-dimensional vector of $X_{ij}$,
$1 \le j \le q$, and let $\theta_i$, $1 \le i \le n$, be associated real random variables that are independent and
identically distributed. Let the pairs $(\mathbf{X}_i, \theta_i)$, $1 \le i \le n$, be mutually independent, and let the $X_{ij}$,
$1 \le j \le q$, be conditionally independent given $\theta_i$. For real $x$ and $y$, let $P(x, y) = [1 + \exp(y - x)]^{-1}$.
Let $p^S_k$, $0 \le k \le q$, be the probability that the examinee sum $S_i = \sum_{j=1}^{q} X_{ij}$ is $k$, let $p_j$, $1 \le j \le q$,
be the probability that $X_{ij} = 1$, and let $m_{kj}$, $0 \le k \le q$, $1 \le j \le q$, be the conditional probability
that $X_{ij} = 1$ given that $S_i = k$. The distinctive feature of the Rasch model is
that, for unknown real parameters $\beta_j$, $1 \le j \le q$, the conditional probability that $X_{ij} = 1$ given
$\theta_i$ is $P(\theta_i, \beta_j)$. To permit parameter identification, $\beta_1$ is assumed to be 0. To assist in joint
maximum-likelihood estimation, it is helpful to consider infinite values. Adopt the convention
that $1/[1 + \exp(z)]$ is 1 for $z = -\infty$ and 0 for $z = \infty$. Let $y - x$ be $\infty$ for $y = \infty$ and $x < \infty$ or for
$y > -\infty$ and $x = -\infty$; let $y - x$ be $-\infty$ for $y = -\infty$ and $x > -\infty$ or for $y < \infty$ and $x = \infty$; and
let $y - x = 0$ if $x = y = \infty$ or $x = y = -\infty$.
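The extended-real conventions above can be made concrete in a few lines. The following is a minimal Python sketch; the helper names `ext_diff` and `P` are hypothetical and not taken from the report. It represents $\pm\infty$ by IEEE infinities, applies the rule $y - x = 0$ when $x = y = \pm\infty$, and evaluates the logistic function in a numerically stable form.

```python
import math

INF = math.inf


def ext_diff(y: float, x: float) -> float:
    """y - x under the report's extended-real conventions."""
    if x == y and math.isinf(x):
        # x = y = +inf or x = y = -inf: the convention sets y - x = 0
        return 0.0
    # IEEE arithmetic already returns +inf / -inf in every other infinite case
    return y - x


def P(x: float, y: float) -> float:
    """Rasch probability P(x, y) = 1 / (1 + exp(y - x)) for extended reals x, y."""
    z = ext_diff(y, x)
    if z == INF:
        return 0.0
    if z == -INF:
        return 1.0
    if z >= 0:
        e = math.exp(-z)  # avoids overflow for large positive z
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))
```

For example, `P(INF, 3.0)` is 1 and `P(INF, INF)` is 0.5. The second value follows because the convention makes $y - x = 0$ in that case.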

In joint maximum-likelihood estimation, estimation proceeds as if the $\theta_i$ were fixed
parameters. The joint log likelihood function is

$$\ell(\mathbf{a}, \mathbf{b}) = \sum_{i=1}^{n} \sum_{j=1}^{q} \{X_{ij} \log P(a_i, b_j) + (1 - X_{ij}) \log[1 - P(a_i, b_j)]\}$$

for $\mathbf{a}$, an $n$-dimensional extended real vector with coordinates $a_i$, $1 \le i \le n$, and $\mathbf{b}$, a $q$-dimensional
extended real vector with coordinates $b_j$, $1 \le j \le q$, and $b_1 = 0$. The convention is used that $0 \log 0 = 0$.
Let $\ell_m$ be the supremum of $\ell(\mathbf{a}, \mathbf{b})$ for $n$-dimensional extended real $\mathbf{a}$ and $q$-dimensional extended
real $\mathbf{b}$ such that $b_1 = 0$, and let $J_m$ be the set of pairs $(\mathbf{a}, \mathbf{b})$ such that $\ell(\mathbf{a}, \mathbf{b}) = \ell_m$.
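For finite arguments, the joint log likelihood can be evaluated directly. The sketch below assumes finite $\mathbf{a}$ and $\mathbf{b}$ and a data matrix given as nested lists of 0/1 values. The helper names are hypothetical. Each cell contributes only the term whose coefficient is nonzero, so the convention $0 \log 0 = 0$ is applied automatically.

```python
import math
from typing import Sequence


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def joint_loglik(X: Sequence[Sequence[int]],
                 a: Sequence[float],
                 b: Sequence[float]) -> float:
    """Joint log likelihood l(a, b) for finite abilities a and difficulties b.

    log P(a_i, b_j) = -softplus(b_j - a_i) and log[1 - P(a_i, b_j)] = -softplus(a_i - b_j).
    Only the term with a nonzero coefficient is added, which implements 0 log 0 = 0.
    """
    total = 0.0
    for i, row in enumerate(X):
        for j, x in enumerate(row):
            z = b[j] - a[i]
            total += -softplus(z) if x == 1 else -softplus(-z)
    return total
```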

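One may then define extended maximum-likelihood estimates $\hat{\boldsymbol{\theta}}$ of the ability vector $\boldsymbol{\theta}$ with
coordinates the examinee abilities $\theta_i$, $1 \le i \le n$, and extended maximum-likelihood estimates $\hat{\boldsymbol{\beta}}$
of the vector $\boldsymbol{\beta}$ with coordinates the item difficulties $\beta_j$, $1 \le j \le q$, so that $(\hat{\boldsymbol{\theta}}, \hat{\boldsymbol{\beta}})$ is in $J_m$ if $J_m$
is nonempty, the initial coordinate $\hat\beta_1$ is 0, and $\hat{\boldsymbol{\theta}}$ and $\hat{\boldsymbol{\beta}}$ are determined by the observed $n$ by $q$
data matrix $\mathbf{X}$ with row $i$ and column $j$ equal to $X_{ij}$. Note that if $J_m$ is nonempty, $b_1 = 0$, and
$\ell(\mathbf{a}, \mathbf{b}) = \ell_m$, then $\hat{\boldsymbol{\theta}} = \mathbf{a}$ and $\hat{\boldsymbol{\beta}} = \mathbf{b}$.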

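As the sample size $n$ increases, $\hat{\boldsymbol{\beta}}$ converges with probability 1 to a real vector $\boldsymbol{\gamma}$ with
coordinates $\gamma_j$, $1 \le j \le q$, such that $\gamma_1 = 0$. For some unique extended real $\theta_{kS}$, $0 \le k \le q$,

$$\sum_{k=0}^{q} p^S_k P(\theta_{kS}, \gamma_j) = p_j, \quad 1 \le j \le q, \tag{1}$$

and

$$\sum_{j=1}^{q} P(\theta_{kS}, \gamma_j) = k, \quad 0 \le k \le q. \tag{2}$$

Equations (1) and (2), together with the constraint that $\gamma_1 = 0$, uniquely determine $\boldsymbol{\gamma}$. By (2),
$\theta_{0S} = -\infty$, $\theta_{qS} = \infty$, and $\theta_{kS}$ is finite for $1 \le k \le q - 1$.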

The basic challenge with joint estimation is that $\boldsymbol{\gamma}$ and $\boldsymbol{\beta}$ need not be the same, so that joint
estimation is asymptotically biased. For example, if $q = 2$, then $\boldsymbol{\gamma} = 2\boldsymbol{\beta}$ (Andersen, 1973, pp.
66-69). Whenever $\boldsymbol{\gamma}$ and $\boldsymbol{\beta}$ differ, an inconsistency problem exists. The size of the inconsistency
is considered in Sections 2 and 3.

It should be emphasized that in conditional maximum-likelihood estimation, the problem of
asymptotic bias is not present. In conditional likelihood, inferences are conditional on the total
score $S_i$. Let $T$ be the set of $q$-dimensional vectors $\mathbf{x}$ with coordinates $x_j$ equal to 0 or 1 for $1 \le j \le q$.
For $0 \le k \le q$, let $T(k)$ be the set of $\mathbf{x}$ in $T$ whose coordinates sum to $k$, that is, $\sum_{j=1}^{q} x_j = k$.
The conditional log likelihood function $\ell_C(\mathbf{b})$ is then defined for $q$-dimensional real vectors $\mathbf{b}$ such that
$b_1 = 0$ by

$$\ell_C(\mathbf{b}) = \sum_{i=1}^{n} \left[ -\sum_{j=1}^{q} b_j X_{ij} - \log \Gamma(\mathbf{b}, S_i) \right],$$

where

$$\Gamma(\mathbf{b}, k) = \sum_{\mathbf{x} \in T(k)} \exp\left( -\sum_{j=1}^{q} b_j x_j \right)$$

for $0 \le k \le q$. Let the supremum of $\ell_C$ be $\ell_{Cm}$, and let $J_C$ be the set of $q$-dimensional real
vectors $\mathbf{b}$ such that $b_1 = 0$ and $\ell_C(\mathbf{b}) = \ell_{Cm}$. Let the conditional maximum-likelihood estimate
$\hat{\boldsymbol{\beta}}_C$ be a function of the observed $X_{ij}$, $1 \le i \le n$, $1 \le j \le q$, such that $\ell_C(\hat{\boldsymbol{\beta}}_C) = \ell_{Cm}$ whenever
$J_C$ is nonempty. Let coordinate $j$ of $\hat{\boldsymbol{\beta}}_C$ be $\hat\beta_{jC}$, $1 \le j \le q$. Then $\hat{\boldsymbol{\beta}}_C$ converges with probability
1 to $\boldsymbol{\beta}$ (Andersen, 1973, chapter 5). Extended real versions of $\hat{\boldsymbol{\beta}}_C$ can be considered, but they
are relatively complicated to describe and of less practical importance than in joint maximum
likelihood. Consequently they are not considered here.
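The quantities $\Gamma(\mathbf{b}, k)$ are elementary symmetric functions of $\exp(-b_1), \dots, \exp(-b_q)$, so they need not be computed by enumerating $T(k)$. The Python sketch below uses the standard one-item-at-a-time recursion. This is one common way to evaluate them, not necessarily the method used for the report, and the function names are hypothetical. The sketch assumes moderate values of $b_j$; long tests with extreme difficulties would call for rescaling to avoid overflow.

```python
import math
from typing import List, Sequence


def gamma_functions(b: Sequence[float]) -> List[float]:
    """Gamma(b, k), k = 0..q: elementary symmetric functions of exp(-b_j).

    Items are added one at a time; updating k from high to low ensures that
    each item enters a given pattern at most once.
    """
    q = len(b)
    g = [1.0] + [0.0] * q  # with no items, only the empty pattern (k = 0) exists
    for j, bj in enumerate(b):
        eps = math.exp(-bj)
        for k in range(j + 1, 0, -1):
            g[k] += eps * g[k - 1]
    return g


def conditional_loglik(X: Sequence[Sequence[int]], b: Sequence[float]) -> float:
    """Conditional log likelihood l_C(b) = sum_i [-sum_j b_j X_ij - log Gamma(b, S_i)]."""
    g = gamma_functions(b)
    total = 0.0
    for row in X:
        s = sum(row)
        total += -sum(bj * x for bj, x in zip(b, row)) - math.log(g[s])
    return total
```

Adding the same constant to every $b_j$ leaves $\ell_C$ unchanged. That invariance is why a constraint such as $b_1 = 0$ is needed for identification.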

2 The Case of Nearly Equal Item Parameters 

Some basic analysis may be conducted by consideration of the case of nearly equal item
parameters. One fixes the distribution of the $\theta_i$ and lets the vector $\boldsymbol{\beta}$ approach the vector $\mathbf{0}$ with
all $q$ coordinates equal to 0. The implicit function theorem is then applied to (1) and (2). The
following result is obtained for the maximum norm $|\mathbf{b}| = \max_{1 \le j \le q} |b_j|$ defined for $q$-dimensional
real vectors $\mathbf{b}$.

Theorem 1. For each real $\delta > 0$, a real $\epsilon > 0$ exists such that $|\boldsymbol{\gamma} - q\boldsymbol{\beta}/(q - 1)| < \delta|\boldsymbol{\beta}|$ whenever
$|\boldsymbol{\beta}| < \epsilon$.

This theorem, which is proven in the appendix, provides some formal basis for attempts to
correct bias in $\hat\beta_j$ by multiplication by $(q - 1)/q$ (Jansen, Wollenberg, & Wierda, 1988; Wollenberg,
Wierda, & Jansen, 1988; Wright, 1988; Wright & Douglas, 1977). For $q = 2$, the result holds with
$\delta = 0$ and $\epsilon$ an arbitrary positive real number. The result is consistent with the general observation that,
for any integer $q \ge 2$, if $\theta_i$ is a bounded random variable, then, for any real constant $c > 0$, a real
$d > 0$ exists such that $|\boldsymbol{\gamma} - \boldsymbol{\beta}| < d/q$ whenever $|\boldsymbol{\beta}| < c$ (Haberman, 2004).
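Theorem 1 suggests two simple maps: the approximate limit $\gamma_j \approx q\beta_j/(q - 1)$, and the heuristic correction that multiplies JML item estimates by $(q - 1)/q$. The following is a minimal sketch with hypothetical function names. It is only the small-$|\boldsymbol{\beta}|$ approximation, not a computation of $\boldsymbol{\gamma}$.

```python
from typing import List, Sequence


def theorem1_limit(beta: Sequence[float]) -> List[float]:
    """Approximate JML limit gamma_j ~ q * beta_j / (q - 1) from Theorem 1 (small |beta|)."""
    q = len(beta)
    return [q * bj / (q - 1) for bj in beta]


def shrink_jml(beta_hat: Sequence[float]) -> List[float]:
    """Multiply JML item estimates by (q - 1)/q, the correction motivated by Theorem 1."""
    q = len(beta_hat)
    return [(q - 1) * bj / q for bj in beta_hat]
```

For $q = 2$ the map $\beta_j \mapsto 2\beta_j$ is exact. For larger $q$ its accuracy degrades as $|\boldsymbol{\beta}|$ grows, as the examples of Section 3 show.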
3 Computation of Asymptotic Limits


For a given distribution of $\theta_i$ and for a given $\boldsymbol{\beta}$, the asymptotic limit $\boldsymbol{\gamma}$ can be computed with
little difficulty by exploiting some techniques for efficient computation of probabilities of sums of
independent Bernoulli random variables (Haberman, 2004). Computations also exploit standard
methods to calculate maximum-likelihood estimates for logit models. The basic observations are
the following. The value of $\boldsymbol{\gamma}$ can be found by maximizing the expected log likelihood $E(\ell(\mathbf{a}, \mathbf{b}))$
for extended real vectors $\mathbf{a}$ of dimension $n$ and extended real vectors $\mathbf{b}$ of dimension $q$ subject to
the constraint that $b_1 = 0$ and, for some $a'_k$, $0 \le k \le q$, $a_i = a'_k$ for any examinee $i$ for whom
$S_i = k$. Given that $a'_0 = -\infty$ and $a'_q = \infty$,

$$n^{-1} E(\ell(\mathbf{a}, \mathbf{b})) = \sum_{k=1}^{q-1} p^S_k \sum_{j=1}^{q} \{m_{kj} \log P(a'_k, b_j) + (1 - m_{kj}) \log[1 - P(a'_k, b_j)]\}.$$

The maximum is achieved for $a'_k = \theta_{kS}$ and $b_j = \gamma_j$, subject to the constraint that $\gamma_1 = 0$.
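The maximization above is a logit-model fit on the table of score groups $1 \le k \le q - 1$ by items. The following is a minimal sketch, not the report's Fortran 95 program. The Python function below (it assumes NumPy) maximizes $n^{-1}E(\ell)$ by damped Newton-Raphson, given $p^S_k$ and $m_{kj}$. The function names, the starting values $\theta_{kS} = \log[k/(q - k)]$ and $\boldsymbol{\gamma} = \mathbf{0}$, and the tolerances are illustrative assumptions. The method relies on the concavity of the expected log likelihood in $(a'_k, b_j)$ once $b_1 = 0$ is fixed.

```python
import numpy as np


def expected_loglik(theta, gamma, p, m):
    """n^{-1} E(l); scores 0 and q contribute 0 and are omitted (theta: (q-1,), gamma: (q,))."""
    eta = theta[:, None] - gamma[None, :]
    # log P = -log(1 + exp(-eta)) and log(1 - P) = -log(1 + exp(eta))
    ll = -m * np.logaddexp(0.0, -eta) - (1.0 - m) * np.logaddexp(0.0, eta)
    return float(np.sum(p[:, None] * ll))


def jml_limit(p, m, tol=1e-10, max_iter=200):
    """Maximize the expected joint log likelihood over theta_k (1 <= k <= q-1)
    and gamma_j (2 <= j <= q), holding gamma_1 = 0, by damped Newton-Raphson.

    p: length q+1 array of score probabilities p_k^S, k = 0..q.
    m: (q+1) x q array of m_kj = Pr(X_ij = 1 | S_i = k).
    Returns gamma (length q) and theta_kS (length q+1, with -inf and +inf at the ends).
    """
    p = np.asarray(p, dtype=float)
    m = np.asarray(m, dtype=float)
    q = m.shape[1]
    n1 = q - 1
    pk, mk = p[1:q], m[1:q]            # scores 0 and q give infinite theta_kS
    ks = np.arange(1, q)
    theta = np.log(ks / (q - ks))      # exact solution when all beta_j are equal
    gamma = np.zeros(q)
    f = expected_loglik(theta, gamma, pk, mk)
    for _ in range(max_iter):
        prob = 1.0 / (1.0 + np.exp(gamma[None, :] - theta[:, None]))
        resid = pk[:, None] * (mk - prob)          # derivative w.r.t. theta_k - gamma_j
        w = pk[:, None] * prob * (1.0 - prob)      # minus the second derivative
        grad = np.concatenate([resid.sum(axis=1), -resid.sum(axis=0)[1:]])
        if np.max(np.abs(grad)) < tol:
            break
        info = np.zeros((2 * n1, 2 * n1))          # information matrix = -Hessian
        info[:n1, :n1] = np.diag(w.sum(axis=1))
        info[n1:, n1:] = np.diag(w.sum(axis=0)[1:])
        info[:n1, n1:] = -w[:, 1:]
        info[n1:, :n1] = -w[:, 1:].T
        step = np.linalg.solve(info, grad)
        t = 1.0
        while True:                                # backtracking (Armijo) line search
            theta_new = theta + t * step[:n1]
            gamma_new = gamma.copy()
            gamma_new[1:] += t * step[n1:]
            f_new = expected_loglik(theta_new, gamma_new, pk, mk)
            if f_new >= f + 1e-4 * t * float(grad @ step) or t < 1e-10:
                break
            t /= 2.0
        theta, gamma, f = theta_new, gamma_new, f_new
    return gamma, np.concatenate([[-np.inf], theta, [np.inf]])
```

For $q = 2$ the sketch reproduces $\gamma_2 = 2\beta_2$ for any ability distribution. The reason is that $m_{12} = P(0, \beta_2)$ does not depend on the distribution of $\theta_i$.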

This maximization can be accomplished by any standard computer package for calculation of
maximum-likelihood estimates for logit models. For each real $a$, define independent Bernoulli
random variables $U_j(a)$, $1 \le j \le q$, so that $U_j(a) = 1$ with probability $P(a, \beta_j)$. Then the product
$p^S_k m_{kj}$ is the expectation of $g_{kj}(\theta_i)$, where $g_{kj}(a)$, $a$ real, is $P(a, \beta_j)$ times the probability that
$\sum_{h=1, h \ne j}^{q} U_h(a) = k - 1$. Similarly, $p^S_k$ is the expectation of $h_k(\theta_i)$, where $h_k(a)$, $a$ real, is the
probability that $\sum_{j=1}^{q} U_j(a) = k$. It follows that $p^S_k(1 - m_{kj})$ is the expectation of $h_k(\theta_i) - g_{kj}(\theta_i)$.
Computation of $g_{kj}(a)$ and $h_k(a)$ can be achieved by a recursive algorithm (Haberman, 2004).
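A straightforward version of such a recursion builds the distribution of $\sum_j U_j(a)$ one item at a time. It then obtains $g_{kj}(a)$ by repeating the recursion with item $j$ left out. This costs $O(q^3)$ operations per value of $a$ and is not necessarily the algorithm of Haberman (2004). The Python sketch below also averages over a discrete ability distribution to give $p^S_k$ and $p^S_k m_{kj}$. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def P(a: float, b: float) -> float:
    """Rasch probability P(a, b) = 1 / (1 + exp(b - a)) for finite a and b."""
    return 1.0 / (1.0 + math.exp(b - a))


def sum_distribution(probs: Sequence[float]) -> List[float]:
    """Distribution of a sum of independent Bernoulli variables, one variable at a time."""
    dist = [1.0]
    for pj in probs:
        new = [0.0] * (len(dist) + 1)
        for k, d in enumerate(dist):
            new[k] += d * (1.0 - pj)   # this variable equals 0
            new[k + 1] += d * pj       # this variable equals 1
        dist = new
    return dist


def h_and_g(a: float, beta: Sequence[float]) -> Tuple[List[float], List[List[float]]]:
    """h[k] = Pr(sum_j U_j(a) = k); g[k][j] = P(a, beta_j) Pr(sum_{h != j} U_h(a) = k - 1)."""
    probs = [P(a, bj) for bj in beta]
    q = len(beta)
    h = sum_distribution(probs)
    g = [[0.0] * q for _ in range(q + 1)]
    for j in range(q):
        rest = sum_distribution(probs[:j] + probs[j + 1:])  # leave item j out
        for k in range(1, q + 1):
            g[k][j] = probs[j] * rest[k - 1]
    return h, g


def score_moments(thetas, weights, beta):
    """p[k] = p_k^S and pm[k][j] = p_k^S m_kj for a discrete ability distribution."""
    q = len(beta)
    p = [0.0] * (q + 1)
    pm = [[0.0] * q for _ in range(q + 1)]
    for t, w in zip(thetas, weights):
        h, g = h_and_g(t, beta)
        for k in range(q + 1):
            p[k] += w * h[k]
            for j in range(q):
                pm[k][j] += w * g[k][j]
    return p, pm
```

The ratios `pm[k][j] / p[k]` give $m_{kj}$. Together with `p`, these are exactly the quantities that enter the expected log likelihood. Two useful checks are $\sum_k g_{kj}(a) = P(a, \beta_j)$ and $\sum_j g_{kj}(a) = k\,h_k(a)$.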

A Fortran 95 program was constructed to find values of $\boldsymbol{\gamma}$. Several examples are helpful to
illustrate application of Theorem 1. If $q = 5$ and $\theta_i$ is 1, 2, or 3 with respective probabilities 0.3,
0.4, and 0.5 and if $\beta_j = j - 1$ for $1 \le j \le 5$, then Theorem 1 suggests that $\gamma_j = 1.25(j - 1)$, so
that $\gamma_1 = 0$, $\gamma_2 = 1.25$, $\gamma_3 = 2.5$, $\gamma_4 = 3.75$, and $\gamma_5 = 5$. In fact, $\gamma_1 = 0$, $\gamma_2 = 1.451$, $\gamma_3 = 2.793$,
$\gamma_4 = 4.075$, and $\gamma_5 = 5.295$. Thus the approximation of Theorem 1 is only moderately accurate
in this case, which involves $\beta_j$ that are not close to 0. The approximation is relatively accurate
if $\beta_j = (j - 1)/10$, for the predicted values are $\gamma_1 = 0$, $\gamma_2 = 0.125$, $\gamma_3 = 0.25$, $\gamma_4 = 0.375$, and
$\gamma_5 = 0.5$. Actual values are $\gamma_1 = 0$, $\gamma_2 = 0.124$, $\gamma_3 = 0.249$, $\gamma_4 = 0.374$, and $\gamma_5 = 0.500$. Thus the
theorem can be helpful with small $|\boldsymbol{\beta}|$, and the theorem is not especially helpful with large $|\boldsymbol{\beta}|$.

The accuracy of the approximation suggested by the theorem can be somewhat worse than
in the previous examples if the range of the $\beta_j$ is increased. Consider the same distribution of
$\theta_i$ for $\beta_2 = -4$, $\beta_3 = -2$, $\beta_4 = 2$, and $\beta_5 = 4$. In this case, $\gamma_2 = -5$, $\gamma_3 = -2.5$, $\gamma_4 = 2.5$,
and $\gamma_5 = 5$ are anticipated, but the actual values are $\gamma_2 = -5.472$, $\gamma_3 = -3.013$, $\gamma_4 = 3.263$, and
$\gamma_5 = 6.382$. Although the theorem may not provide a fully satisfactory approximation, it should
be emphasized that $\boldsymbol{\gamma}$ is certainly poorly approximated by $\boldsymbol{\beta}$ in numerous cases. For instance, in
the last example, $|\boldsymbol{\gamma} - 1.25\boldsymbol{\beta}|$ is 1.382, but $|\boldsymbol{\gamma} - \boldsymbol{\beta}|$ is 2.382.
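The two max-norm discrepancies quoted for the last example follow directly from the reported limits. The sketch below recomputes them from the values listed in the text. It only checks the arithmetic and does not compute $\boldsymbol{\gamma}$ itself.

```python
def max_norm(v):
    """Maximum norm |v| = max_j |v_j|."""
    return max(abs(x) for x in v)


beta = [0.0, -4.0, -2.0, 2.0, 4.0]
gamma = [0.0, -5.472, -3.013, 3.263, 6.382]  # limits reported in the text
q = len(beta)

dev_theorem = max_norm([g - q * b / (q - 1) for g, b in zip(gamma, beta)])
dev_naive = max_norm([g - b for g, b in zip(gamma, beta)])
print(f"{dev_theorem:.3f} {dev_naive:.3f}")  # 1.382 2.382
```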

Although it is clearly helpful for the number of items to increase, it should be emphasized
that the difference between $\boldsymbol{\gamma}$ and $\boldsymbol{\beta}$ can be large for numbers of items encountered in tests. For
the approximation of Theorem 1, accuracy results for the same distribution of $\theta_i$ but for $q = 16$
and $\beta_j = 0.3(j - 1)$ are relatively satisfactory but certainly not precise, for $|\boldsymbol{\gamma} - (16/15)\boldsymbol{\beta}|$ is 0.082
and $|\boldsymbol{\gamma} - \boldsymbol{\beta}|$ is 0.346. If $q$ is 41, if $\theta_i$ is $-2$, 0, or 2 with respective probabilities 0.3, 0.4, and 0.3,
if $\beta_j = 0.2(j - 22)$ for $2 \le j \le 21$, and if $\beta_j = 0.2(j - 21)$ for $22 \le j \le 41$, so that the $\beta_j$ are placed
on evenly spaced points from $-4$ to 4, then $|\boldsymbol{\gamma} - 1.025\boldsymbol{\beta}|$ is 0.054 and $|\boldsymbol{\gamma} - \boldsymbol{\beta}|$ is 0.155. Numerous
alternative cases can be considered by variation of the distribution of $\theta_i$, variation of the number
$q$ of items, or variation of the item difficulties $\beta_j$. Particularly severe problems can be found for
$\theta_i$ equal to 5 with probability 1, $\beta_j$ equal to 0 for $j \le q/2$, and $\beta_j = 10$ otherwise. For instance, for
$q = 20$, $|\boldsymbol{\gamma} - \boldsymbol{\beta}|$ is 4.883 and $|\boldsymbol{\gamma} - (20/19)\boldsymbol{\beta}|$ is 4.356.

4 Joint Maximum Likelihood and Equating 

To interpret results in Sections 2 and 3, the practical effect of bias may be considered in terms
of equating. For this purpose, consider equating of the results for the $X_{ij}$ to a reference form
with $q' \ge 2$ items and $n'$ examinees. Let items 1 to $r \ge 1$ be common, where $r$ is less than both
$q$ and $q'$. Let $X'_{ij}$, $1 \le i \le n'$, $1 \le j \le q'$, be binary random variables with values 0 or 1, such
that $X'_{ij}$ represents a response on the reference form of examinee $i$ to item $j$, with $X'_{ij}$
equal to 1 for a correct response and equal to 0 otherwise. Assume that the examinees for the two
forms are distinct. Let $\mathbf{X}'_i$ be the $q'$-dimensional vector of $X'_{ij}$, $1 \le j \le q'$, and let $\theta'_i$, $1 \le i \le n'$,
be associated real random variables that are independent and identically distributed. Let the
pairs $(\mathbf{X}'_i, \theta'_i)$, $1 \le i \le n'$, be mutually independent, and let the $X'_{ij}$, $1 \le j \le q'$, be conditionally
independent given $\theta'_i$. For unknown real parameters $\beta'_j$, $1 \le j \le q'$, let the conditional probability
that $X'_{ij} = 1$ given $\theta'_i$ be $P(\theta'_i, \beta'_j)$, and let $\beta'_1 = 0$. Note that if the common items perform as
common items, then it should be true that $\beta_j = \beta'_j$ for $1 \le j \le r$, but the analysis here will only
use the restraint that the arithmetic mean $\mu$ of the $\beta_j$, $1 \le j \le r$, is the same as the arithmetic
mean $\mu'$ of the $\beta'_j$, $1 \le j \le r$. Other approaches can be considered that lead to relatively similar
results. The approach selected is the simplest to apply without changing procedures for parameter
estimation in some fashion.

Let the joint maximum-likelihood estimate of the vector $\boldsymbol{\beta}'$ of $\beta'_j$, $1 \le j \le q'$, be $\hat{\boldsymbol{\beta}}'$, and
let the conditional maximum-likelihood estimate of $\boldsymbol{\beta}'$ be $\hat{\boldsymbol{\beta}}'_C$, so that, as $n'$ becomes large,
$\hat{\boldsymbol{\beta}}'$ converges with probability 1 to a limit $\boldsymbol{\gamma}'$ and $\hat{\boldsymbol{\beta}}'_C$ converges with probability 1 to $\boldsymbol{\beta}'$. Define
the following arithmetic means over the common items $1 \le j \le r$: $\hat\mu$ of the $\hat\beta_j$, $\hat\mu'$ of the $\hat\beta'_j$,
$\hat\mu_C$ of the $\hat\beta_{jC}$, $\hat\mu'_C$ of the $\hat\beta'_{jC}$, $\nu$ of the $\gamma_j$, and $\nu'$ of the $\gamma'_j$. Then the probability is 1
that $\hat\mu$ converges to $\nu$, $\hat\mu'$ converges to $\nu'$, $\hat\mu_C$ converges to $\mu$, and $\hat\mu'_C$ converges to $\mu' = \mu$.

Consider equating by true scores. For the vector $\mathbf{b}$ of item parameters associated with the
$X_{ij}$, the test characteristic curve is

$$V(a, \mathbf{b}) = \sum_{j=1}^{q} P(a, b_j).$$

Similarly, for the vector $\mathbf{b}'$ of item parameters associated with the $X'_{ij}$, the test characteristic
curve is

$$V'(a, \mathbf{b}') = \sum_{j=1}^{q'} P(a, b'_j).$$

For real vectors $\mathbf{b}$ and $\mathbf{b}'$, both $q^{-1}V(a, \mathbf{b})$ and $(q')^{-1}V'(a, \mathbf{b}')$ are strictly increasing continuously
differentiable functions of $a$ with infimum 0 and supremum 1. Thus the equation $V(a, \mathbf{b}) = k$ has a
unique solution $a$ for $0 < k < q$ if all $b_j$ are finite.
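Because $V(a, \mathbf{b})$ is strictly increasing in $a$, the solution of $V(a, \mathbf{b}) = k$ can be found by bisection. The sketch below uses hypothetical function names. It brackets the root with $\min_j b_j + \log[k/(q - k)]$ and $\max_j b_j + \log[k/(q - k)]$. At these two points every $P(a, b_j)$ is respectively at most and at least $k/q$.

```python
import math
from typing import Sequence


def P(a: float, b: float) -> float:
    """Rasch probability P(a, b) = 1 / (1 + exp(b - a)), evaluated stably."""
    z = b - a
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def tcc(a: float, b: Sequence[float]) -> float:
    """Test characteristic curve V(a, b) = sum_j P(a, b_j)."""
    return sum(P(a, bj) for bj in b)


def tcc_inverse(k: float, b: Sequence[float], tol: float = 1e-12) -> float:
    """Unique a with V(a, b) = k for 0 < k < q, by bisection (V is strictly increasing)."""
    q = len(b)
    if not 0 < k < q:
        raise ValueError("need 0 < k < q for a finite solution")
    logit = math.log(k / (q - k))
    lo, hi = min(b) + logit, max(b) + logit  # V(lo) <= k <= V(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tcc(mid, b) < k:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For instance, take $\beta_j = 0$ on items 1 to 10 and $\beta_j = 1$ on items 11 to 20. Then `tcc_inverse(10, beta)` returns 0.5. With $\beta'_j = -1$ on items 11 to 20, `tcc(0.5, beta_prime)` is about 14.40. These are the values of $\theta_{kC}$ and $W_{kC}$ quoted for the first example in this section.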

To illustrate equating, consider the following equating procedures for a conversion from a
total score $S_i$ to a total score $S'_i = \sum_{j=1}^{q'} X'_{ij}$. With joint maximum likelihood, for each total score
$S_i = k$, $0 < k < q$, let $\hat\theta_{kS}$ satisfy

$$V(\hat\theta_{kS}, \hat{\boldsymbol{\beta}}) = k,$$

so that $\hat\theta_{kS}$ is the joint maximum-likelihood estimate of $\theta_i$ for any examinee $i$ for whom $S_i = k$.
With probability 1, $\hat\theta_{kS}$ converges to $\theta_{kS}$ (Haberman, 2004). Let the conversion from $S_i = k$ to $S'_i$
be

$$\hat W_k = V'(\hat\theta_{kS} + \hat\mu' - \hat\mu, \hat{\boldsymbol{\beta}}').$$

With conditional maximum likelihood, for each total score $S_i = k$, $0 < k < q$, let $\hat\theta_{kC}$ satisfy

$$V(\hat\theta_{kC}, \hat{\boldsymbol{\beta}}_C) = k,$$
so that $\hat\theta_{kC}$ is an estimate of $\theta_i$ for $S_i = k$ that is based on conditional maximum likelihood. Let
$\theta_{kC}$ satisfy

$$V(\theta_{kC}, \boldsymbol{\beta}) = k.$$

With probability 1, $\hat\theta_{kC}$ converges to $\theta_{kC}$ (Haberman, 2004). Let the conversion from $S_i = k$ to $S'_i$
be

$$\hat W_{kC} = V'(\hat\theta_{kC} + \hat\mu'_C - \hat\mu_C, \hat{\boldsymbol{\beta}}'_C).$$

These definitions are not without complications if some estimates of item parameters are infinite,
but these problems can be ignored for the purpose of derivation of large-sample results.

Let

$$W_k = V'(\theta_{kS} + \nu' - \nu, \boldsymbol{\gamma}')$$

and

$$W_{kC} = V'(\theta_{kC}, \boldsymbol{\beta}').$$

It is easily shown that the probability is 1 that $\hat W_k$ converges to $W_k$ and $\hat W_{kC}$ converges to
$W_{kC}$. Here $W_{kC}$ is the conversion that results if all item parameters are known, so it may be
regarded as the correct conversion. The issue is the extent to which $W_k$ and $W_{kC}$ differ. Only the case
$1 \le k \le q - 1$ is of interest, for $W_0 = W_{0C} = 0$ and $W_q = W_{qC} = q'$.

If $|\boldsymbol{\beta}|$ and $|\boldsymbol{\beta}'|$ are small, then Theorem 1 and the implicit function theorem may be applied.
Let $\bar\beta$ be the arithmetic mean of $\beta_j$ for $1 \le j \le q$, and let $\bar\beta'$ be the arithmetic mean of $\beta'_j$ for
$1 \le j \le q'$. The following results are obtained. For any real $\delta > 0$, a real $\epsilon > 0$ can be found so that
if $|\boldsymbol{\beta}| < \epsilon$ and $|\boldsymbol{\beta}'| < \epsilon$, then the following relationships hold:

$$|\theta_{kS} - \log[k/(q - k)] - [q/(q - 1)]\bar\beta| < \delta|\boldsymbol{\beta}|,$$

$$|\theta_{kC} - \log[k/(q - k)] - \bar\beta| < \delta|\boldsymbol{\beta}|,$$

$$|\nu - [q/(q - 1)]\mu| < \delta|\boldsymbol{\beta}|,$$

$$|\nu' - [q'/(q' - 1)]\mu'| < \delta|\boldsymbol{\beta}'|,$$

$$|W_{kC} - kq'/q - q'(k/q)(1 - k/q)(\bar\beta - \bar\beta')| < \delta(|\boldsymbol{\beta}| + |\boldsymbol{\beta}'|),$$

and

$$|W_k - W_{kC} - B| < \delta(|\boldsymbol{\beta}| + |\boldsymbol{\beta}'|),$$

where

$$B = q'(k/q)(1 - k/q)[(\bar\beta - \mu)/(q - 1) - (\bar\beta' - \mu)/(q' - 1)]$$

is the approximate bias in the conversion from use of joint maximum likelihood. For the case of
forms of equal length ($q = q'$), $B = (k/q)(1 - k/q)[q/(q - 1)](\bar\beta - \bar\beta')$ thus depends on the difference
$\bar\beta - \bar\beta'$ in the average item parameters for the two forms. The suggestion is that bias is most likely
to be a problem when $\bar\beta$ and $\bar\beta'$ are somewhat different and when $k$ is about $q/2$.
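The approximation $B$ is a closed-form expression, so it can be tabulated directly. The following minimal sketch uses a hypothetical function name and argument order. It evaluates $B$ from $k$, $q$, $q'$, the two form means $\bar\beta$ and $\bar\beta'$, and the common-item mean $\mu$.

```python
def bias_approx(k: int, q: int, qp: int, beta_bar: float, beta_bar_p: float, mu: float) -> float:
    """Small-|beta| approximation B to W_k - W_kC (joint ML conversion bias).

    k: raw score on the new form (1 <= k <= q - 1); q, qp: numbers of items q and q';
    beta_bar, beta_bar_p: mean difficulties on the new and reference forms;
    mu: common-item mean difficulty (equal on both forms by assumption).
    """
    f = k / q
    return qp * f * (1.0 - f) * ((beta_bar - mu) / (q - 1) - (beta_bar_p - mu) / (qp - 1))
```

For forms of equal length, $\mu$ drops out. With $k = 3$, $q = q' = 20$, and $\bar\beta - \bar\beta' = 2$, the function returns about 0.268, which matches the value quoted in the next paragraph.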

To illustrate results, consider $q = q' = 20$, $k = 10$, and $\bar\beta - \bar\beta' = 1$. Then the bias
approximation $B$ is $5/19 \approx 0.263$. Consider the specific case of $\beta_j = \beta'_j = 0$ for $1 \le j \le r = 10$,
$\beta_j = 1$ for $11 \le j \le 20$, $\beta'_j = -1$ for $11 \le j \le 20$, and $\theta_i$ and $\theta'_i$ both uniformly distributed on the
10 points $(h - 5.5)/10$ for $1 \le h \le 10$. In that case $\theta_{kS}$ is 0.5264, $\theta_{kC}$ is 0.5, $W_k$ is 14.58,
$W_{kC}$ is 14.40, and the actual asymptotic bias for the conversion at $k$ is 0.1771. If $\beta_j$ is changed to 0.1
for $j > 10$ and $\beta'_j$ is changed to $-0.1$ for $j > 10$, then the new approximation $B = 0.0263$ is much more
accurate, for the correct asymptotic bias is then 0.02618. The bias approximation $B$ can be rather
inaccurate for larger values of $|\boldsymbol{\beta}|$ and $|\boldsymbol{\beta}'|$. For example, consider $q = q' = 20$, $k = r = 10$, $\beta_j = \beta'_j = 0$
for $1 \le j \le 10$, $\beta_j = 2$ and $\beta'_j = -2$ for $11 \le j \le 20$, and $\theta_i = \theta'_i = 0$ with probability 1. Then
$B$ is 0.526, but $\theta_{kS} = 1.057$, $\theta_{kC} = 1$, $W_k = 17.02$, $W_{kC} = 16.84$, and the asymptotic bias for the
conversion is 0.182. On the other hand, a change of $k$ to 3 leads to $B = 0.268$, but $\theta_{kS} = -1.049$,
$\theta_{kC} = -1.069$, $W_k = 10.03$, and $W_{kC} = 9.73$. The asymptotic bias is then 0.302.

Rather extreme examples can be constructed. Consider $q = q' = 20$, $r = 10$, $\beta_j = \beta'_j = 0$ for
$1 \le j \le 10$, and $\beta_j = \beta'_j = 10$ for $11 \le j \le 20$. Thus the two forms have identical item parameters. Let
$\theta_i = 5$ with probability 1, and let $\theta'_i = 10$ with probability 1. Consider $k = 15$. Then $\theta_{kS} = 14.882$,
$W_{kC} = 15$, and $W_k = 19.90$. Thus the conversion based on joint maximum likelihood is remarkably
distant from the correct conversion for this value of $k$. The result is especially striking given that
the forms do not differ in terms of item parameters.

At the other extreme, it is obviously true that no asymptotic bias exists if $\theta_i$ and $\theta'_i$ have the
same distribution, $q = q'$, and $\beta_j = \beta'_j$ for $1 \le j \le q$.

To interpret these results requires some consideration of what is an unacceptable size for
a bias. One criterion involves mean-squared error. By this standard, any asymptotic bias is
eventually unacceptable when estimation without asymptotic bias is available. This standard
is relevant to use of joint estimation, for $n^{1/2}(\hat{\boldsymbol{\beta}} - \boldsymbol{\gamma})$, $(n')^{1/2}(\hat{\boldsymbol{\beta}}' - \boldsymbol{\gamma}')$, $n^{1/2}(\hat{\boldsymbol{\beta}}_C - \boldsymbol{\beta})$,
$(n')^{1/2}(\hat{\boldsymbol{\beta}}'_C - \boldsymbol{\beta}')$, $n^{1/2}(\hat\theta_{kS} - \theta_{kS})$, and $n^{1/2}(\hat\theta_{kC} - \theta_{kC})$, $1 \le k \le q - 1$, all have approximate
normal distributions for $n$ and $n'$ large (Haberman, 1977, 2004). By standard large-sample
theory, constants $b_k$, $b'_k$, $b_{kC}$, and $b'_{kC}$ exist such that $(\hat W_k - W_k)/(b_k/n + b'_k/n')^{1/2}$ and
$(\hat W_{kC} - W_{kC})/(b_{kC}/n + b'_{kC}/n')^{1/2}$ have approximate standard normal distributions for $n$ and $n'$
large.

For one illustration, let $\sigma^2(a, b) = P(a, b)[1 - P(a, b)]$ for extended real $a$ and $b$. For $q$ large
and $j > 1$, $n^{1/2}(\hat\beta_j - \gamma_j)$ has asymptotic variance of approximately

$$1/E(\sigma^2(\theta_i, 0)) + 1/E(\sigma^2(\theta_i, \beta_j)).$$

For $\beta_j = 1$, $n = 100{,}000$, and $\theta_i = 0$ with probability 1, the asymptotic standard deviation of $\hat\beta_j$ is
about

$$\{[1/\sigma^2(0, 0) + 1/\sigma^2(0, 1)]/100{,}000\}^{1/2} = 0.0095,$$

a value somewhat smaller than the typical asymptotic bias for $\hat\beta_j$.
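The standard-deviation figure can be reproduced in a few lines. The sketch below simply evaluates the displayed expression with $\sigma^2(a, b) = P(a, b)[1 - P(a, b)]$. The helper name is hypothetical.

```python
import math


def sigma2(a: float, b: float) -> float:
    """sigma^2(a, b) = P(a, b)[1 - P(a, b)] with P(a, b) = 1 / (1 + exp(b - a))."""
    p = 1.0 / (1.0 + math.exp(b - a))
    return p * (1.0 - p)


n = 100_000
sd = math.sqrt((1.0 / sigma2(0.0, 0.0) + 1.0 / sigma2(0.0, 1.0)) / n)
print(round(sd, 4))  # 0.0095
```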

5 Conclusion 

The logical consequence of the results in this report is that joint estimation is very difficult to
justify for assessments of customary length. This conclusion can be awkward given that clients
used to joint estimation may be reluctant to make changes; however, the biases involved are
not negligible, albeit not usually very large. In this report, no studied equating conversion had
an asymptotic bias greater than about 0.3 except for the extreme case with $\beta_j = \beta'_j = 10$ for
$11 \le j \le 20$. The mapping of raw scores to scale scores on the reference form obviously affects
the equating implications for reported scores, so implications for reported scales are not entirely
predictable. Because of rounding, some change in reported scores can sometimes occur even for
relatively small changes in conversions, and such changes, if they occur, can normally be
expected to all be in the same direction. Suppose that each raw score corresponds to a different scale score
and that simple rounding is applied to the raw-to-raw conversions to establish a scale score. Then a
bias of about 0.3 is likely to change the reported scale score when $W_{kC}$ exceeds the nearest integer
by 0.2 to 0.5. The likelihood of a change is much lower in other cases.
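The rounding argument can be checked directly. The sketch below assumes, as in the text, that each raw score maps to a distinct scale score. It also assumes simple rounding with halves rounded up. The helper names are hypothetical. With a bias of 0.3, the rounded value changes exactly when the fractional part of $W_{kC}$ lies in $[0.2, 0.5)$.

```python
import math


def reported(x: float) -> int:
    """Simple rounding to the nearest integer, with halves rounded up."""
    return math.floor(x + 0.5)


def rounding_changes(w_correct: float, bias: float) -> bool:
    """True if adding the conversion bias changes the rounded score."""
    return reported(w_correct + bias) != reported(w_correct)
```

For example, `rounding_changes(9.35, 0.3)` is true, while `rounding_changes(9.73, 0.3)` is false.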

It should be noted that repeated use of equating may cause bias to be a more significant issue
than with one equating, especially in cases such as vertical linking in which item parameters may
vary considerably across all forms to be linked. A second obvious issue is that statistical inference
procedures such as approximate confidence intervals and tests of goodness of fit do not apply
when asymptotic biases are present. Thus use of joint estimation prevents proper study of such
basic issues as whether the Rasch model fits the data at all and how precisely item difficulties are
known. Whether conditional or joint estimation is used, true-score equating with the Rasch model
cannot be readily justified if the Rasch model fits the data poorly.

The most fundamental issue is that, at the present time, no statistical or computational reason
exists not to use conditional maximum likelihood and thereby avoid the problem of asymptotic
bias entirely.
Appendix

Proof of Theorem 1 


Let $T(k)$, $0 \le k \le q$, be the set of $q$-dimensional vectors $\mathbf{x}$ with coordinates 0 or 1 such that
the sum of the coordinates is $k$, so that $T(k)$ has $q!/[k!(q - k)!]$ elements, and the conditional
probability that $S_i = k$ given that $\theta_i = a$ is

$$p_{k|\theta}(a) = \sum_{\mathbf{x} \in T(k)} \prod_{j=1}^{q} \frac{\exp[x_j(a - \beta_j)]}{1 + \exp(a - \beta_j)}.$$


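Let $\tau_j(k)$ be $P(\theta_{kS}, \gamma_j)$ for $1 \le j \le q$ and $0 \le k \le q$. Then the conditional expected value of
$\tau_j(S_i)$ given $\theta_i = a$ is

$$Q_j(a) = \sum_{k=0}^{q} P(\theta_{kS}, \gamma_j)\, p_{k|\theta}(a),$$

and, because $p^S_k = E(p_{k|\theta}(\theta_i))$ and $p_j = E(P(\theta_i, \beta_j))$, (1) reduces to the equation

$$E(Q_j(\theta_i)) = E(P(\theta_i, \beta_j)), \quad 1 \le j \le q. \tag{A1}$$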

If $\boldsymbol{\beta} = \mathbf{0}$, the $q$-dimensional vector with coordinates 0, then the conditional distribution of
$S_i$ given $\theta_i = a$ is a binomial distribution with sample size $q$ and probability $P(a, 0)$, and $p_j$ is
$E(P(\theta_i, 0))$ for $1 \le j \le q$. Equations (2) and (A1) hold for $\boldsymbol{\gamma} = \mathbf{0}$, $\theta_{0S} = -\infty$, $\theta_{qS} = \infty$, and
$\theta_{kS} = \log[k/(q - k)]$, $1 \le k \le q - 1$. The reason is that $\tau_j(k)$ is then $k/q$ for $1 \le j \le q$ and $0 \le k \le q$, and $Q_j(a)$ is
$P(a, 0)$ for $1 \le j \le q$.

The vector $\boldsymbol{\gamma}$ is a continuously differentiable function of $\boldsymbol{\beta}$ (Haberman, 2004). Let $\gamma_{jh}$,
$1 \le j \le q$, be the partial derivative of $\gamma_j$ with respect to $\beta_h$, and let $\theta_{kSh}$ be the partial derivative
of $\theta_{kS}$, $1 \le k \le q - 1$, with respect to $\beta_h$ for $2 \le h \le q$. Obviously the constraint $\gamma_1 = 0$ implies
that $\gamma_{1h} = 0$. As in Section 4, let $\sigma^2(a, b) = P(a, b)[1 - P(a, b)]$ for extended real $a$ and $b$. Let $\delta_{ab}$
be 1 for $a = b$ and 0 otherwise. Let $Z_{ijk}$, $1 \le i \le n$, $1 \le j \le q$, $0 \le k \le q$, be 1 for $X_{ij} = 1$ and
$S_i = k$, and let $Z_{ijk}$ be 0 otherwise. Let $Y_{ik}$ be 1 for $S_i = k$ and 0 otherwise. Because
$\partial p_{k|\theta}(a)/\partial\beta_h = E(Y_{ik}[P(a, \beta_h) - X_{ih}] \mid \theta_i = a)$, differentiation of (A1) and (2) shows that, for $2 \le h \le q$,

$$\sum_{k=1}^{q-1} \sigma^2(\theta_{kS}, \gamma_j)(\theta_{kSh} - \gamma_{jh})\, p^S_k - \sum_{k=1}^{q} P(\theta_{kS}, \gamma_j)\, E(Z_{ihk} - Y_{ik} P(\theta_i, \beta_h)) + E(\sigma^2(\theta_i, \beta_j))\,\delta_{jh} = 0$$

for $1 \le j \le q$, and

$$\sum_{j=1}^{q} \sigma^2(\theta_{kS}, \gamma_j)(\theta_{kSh} - \gamma_{jh}) = 0$$

for $1 \le k \le q - 1$.

In the special case of $\boldsymbol{\beta} = \mathbf{0}$, $\theta_{kSh}$ is the average $q^{-1}\sum_{j=1}^{q} \gamma_{jh}$ of the $\gamma_{jh}$ for $1 \le k \le q - 1$. For
each integer $k$ from 0 to $q$, the distribution of $\mathbf{X}_i$ given $S_i = k$ is symmetric. Thus $E(Z_{ihk})$ is
$k p^S_k / q$. Given that $P(\theta_{kS}, \gamma_j) = k/q$, after division by $E(\sigma^2(\theta_i, 0))$, standard results related to the
binomial distribution yield the equation

$$\frac{q - 1}{q}\left(q^{-1} \sum_{j'=1}^{q} \gamma_{j'h} - \gamma_{jh}\right) + \delta_{jh} - q^{-1} = 0$$

for $1 \le j \le q$ and $2 \le h \le q$. Given the constraint that $\gamma_{1h} = 0$ for $2 \le h \le q$, it follows that
$q^{-1}\sum_{j=1}^{q} \gamma_{jh} = 1/(q - 1)$ and $\gamma_{jh} = [q/(q - 1)]\delta_{jh}$. The theorem follows from the definition of
differentiability.
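The two binomial facts used in the last step can be written out explicitly. Given $\theta_i = a$, let $\pi = P(a, 0)$, so that $S_i$ is binomial with sample size $q$ and probability $\pi$, and $\sigma^2(a, 0) = \pi(1 - \pi)$. The following is a worked expansion of that step, consistent with the equations above:

$$\begin{aligned}
E\left[\frac{S_i}{q}\left(1 - \frac{S_i}{q}\right) \,\middle|\, \theta_i = a\right] &= \pi - \left[\frac{\pi(1 - \pi)}{q} + \pi^2\right] = \frac{q - 1}{q}\,\sigma^2(a, 0),\\
E\left[\frac{S_i}{q}\,(X_{ih} - \pi) \,\middle|\, \theta_i = a\right] &= \frac{\pi + (q - 1)\pi^2 - q\pi^2}{q} = \frac{\sigma^2(a, 0)}{q}.
\end{aligned}$$

At $\boldsymbol{\beta} = \mathbf{0}$ we have $\sigma^2(\theta_{kS}, 0) = (k/q)(1 - k/q)$ and $P(\theta_{kS}, 0) = k/q$. Hence the first sum in the differentiated equation equals $[(q - 1)/q]\,E(\sigma^2(\theta_i, 0))\,(q^{-1}\sum_{j'} \gamma_{j'h} - \gamma_{jh})$. The second sum equals $E[(S_i/q)(X_{ih} - P(\theta_i, 0))] = E(\sigma^2(\theta_i, 0))/q$. Dividing by $E(\sigma^2(\theta_i, 0))$ gives the displayed equation. Setting $j = 1$ and using $\gamma_{1h} = 0$ and $\delta_{1h} = 0$ then yields $q^{-1}\sum_{j'} \gamma_{j'h} = 1/(q - 1)$.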